A Mutual Information-based Assessment of Reverse Engineering on Rewards of Reinforcement Learning
Abstract
Rewards are critical hyperparameters in reinforcement learning (RL), since in most cases different reward values lead to greatly different performance. Due to their commercial value, RL rewards have become the target of reverse engineering by the inverse reinforcement learning (IRL) algorithm family. Existing efforts typically use two metrics to measure IRL performance: expected value difference and mean reward loss, which we call EVD and MRL, respectively. Unfortunately, in some cases EVD and MRL can give completely opposite results, because one focuses on rewards over the whole state space while the other considers only partly sampled rewards. This situation naturally raises one fundamental question: whether the current assessment is sufficient and accurate for more general use. Thus, in this paper, we propose a novel IRL assessment based on a metric called normalized mutual information of reward clusters (C-NMI); we aim to fill this research gap by considering a middle-granularity state space between the entire state space and the specific sampling space. We use agglomerative nesting (AGNES) to control dynamical C-NMI computing via a 4-order tensor model with injected manipulated trajectories. With such a model, we can uniformly capture different-dimension values of MRL, EVD, and C-NMI, and perform comprehensive analyses. Extensive experiments on several mainstream IRL algorithms in Object World reveal that the assessing accuracy of our method increases by 110.13% and 116.59%, respectively, when compared with EVD and MRL. Meanwhile, C-NMI is more robust than EVD and MRL across demonstrations.
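To make the idea of comparing reward clusters more concrete, the following is a minimal sketch and not the paper's implementation. It assumes that rewards are scalar per state, that AGNES is approximated by a simple centroid-linkage bottom-up merge, that NMI is normalized by the geometric mean of the two entropies, and that MRL is the mean absolute difference between true and recovered rewards. The function names (`agnes_1d`, `nmi`, `mean_reward_loss`) and the toy reward vectors are hypothetical, and the 4-order tensor model from the abstract is not reproduced here.

```python
import math
from collections import Counter


def agnes_1d(values, n_clusters):
    """Bottom-up clustering of scalar rewards (assumed centroid linkage)."""
    clusters = [[i] for i in range(len(values))]  # start: one cluster per state
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            mean_a = sum(values[i] for i in clusters[a]) / len(clusters[a])
            for b in range(a + 1, len(clusters)):
                mean_b = sum(values[i] for i in clusters[b]) / len(clusters[b])
                dist = abs(mean_a - mean_b)
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # merge the two closest clusters
        del clusters[b]
    labels = [0] * len(values)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def entropy(labels):
    """Shannon entropy (in nats) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(labels_a, labels_b):
    """Normalized mutual information (assumed geometric-mean normalization)."""
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mi = sum(
        (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    h_a, h_b = entropy(labels_a), entropy(labels_b)
    if h_a == 0.0 or h_b == 0.0:
        # Degenerate case: at least one side has a single cluster.
        return 1.0 if h_a == h_b else 0.0
    return mi / math.sqrt(h_a * h_b)


def mean_reward_loss(true_r, recovered_r):
    """Assumed MRL: mean absolute difference over all states."""
    return sum(abs(t - r) for t, r in zip(true_r, recovered_r)) / len(true_r)


# Hypothetical toy rewards for 8 states, forming three well-separated groups.
true_rewards = [0.0, 0.2, 0.5, 4.0, 4.3, 9.0, 9.1, 9.4]
# A recovered reward that is an affine rescaling of the true one.
rescaled = [2.0 * r + 3.0 for r in true_rewards]
# A recovered reward with the same values assigned to the wrong states.
scrambled = [9.0, 0.2, 4.0, 0.0, 9.1, 4.3, 0.5, 9.4]

true_labels = agnes_1d(true_rewards, 3)
nmi_rescaled = nmi(true_labels, agnes_1d(rescaled, 3))
nmi_scrambled = nmi(true_labels, agnes_1d(scrambled, 3))
mrl_rescaled = mean_reward_loss(true_rewards, rescaled)

print("MRL (rescaled):", mrl_rescaled)
print("C-NMI-style score (rescaled):", nmi_rescaled)
print("C-NMI-style score (scrambled):", nmi_scrambled)
```

In this toy setting the rescaled reward has a large pointwise loss yet reproduces the cluster structure of the true reward, while the scrambled reward does not. This illustrates why a cluster-level score can disagree with a pointwise loss, under the assumptions stated above.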
Similar resources
The Impact of Training on Second Language Writing Assessment: A Case of Raters' Biasedness
Abstract: The first aim of this study was to examine the effect of rater training on trainees, based on the reliability of their scores in five components: content, organization, vocabulary, language, and mechanics. The second aim was to find out whether female and male trainees differ in the reliability of their scores. To investigate these questions, 90 intermediate-level students were selected by means of a placement test. We then asked them to ... about two topics ...
The Effect of Explicit Teaching of Metacognitive Vocabulary Learning Strategies on Recall and Retention of Idioms
No abstract available.
The Effect of Lexically Based Language Teaching (LBLT) on Vocabulary Learning among Iranian Pre-University Students
The aim of the present study is to examine the effect of the lexically based (word-centered) teaching method on vocabulary learning among pre-university students. To this end, two groups of pre-university students (sixty in total) who were studying in the academic year 1389 (Solar Hijri calendar) in the city of Nurabad, Lorestan Province, were selected and designated by convention as the experimental and control groups. Initially, to ensure that the two groups were homogeneous in vocabulary knowledge, ...
On Classification of Bivariate Distributions Based on Mutual Information
Among all measures of independence between random variables, mutual information is the only one that is based on information theory. Mutual information takes into account all kinds of dependencies between variables, i.e., both linear and non-linear dependencies. In this paper we have classified some well-known bivariate distributions into two classes of distributions based on their mutua...
The Impact of Portfolio Assessment on Iranian EFL Students' Essay Writing: A Process-Oriented Approach
This study was conducted to investigate the impact of portfolio assessment as a process-oriented assessment mechanism on Iranian EFL students' English writing and its subskills of focus, elaboration, organization, conventions, and vocabulary. Out of ninety juniors majoring in English literature and translation at the University of Isfahan, sixty-one of them who were at the same level of writing...
Journal
Journal title: IEEE Transactions on Artificial Intelligence
Year: 2022
ISSN: 2691-4581
DOI: https://doi.org/10.1109/tai.2022.3190811